Chinese Term Extraction from Web Pages Based on Compound Term Productivity
Abstract
In this paper, we propose an automatic term recognition system for Chinese. Our approach is based on the relation between a compound word and its constituents, which are simple words or individual Chinese characters. More precisely, we focus on how many words or characters adjoin the word or character in question to form compound words. We also take term frequency into account. We evaluated a word-based method and a character-based method on several Chinese Web pages, obtaining a precision of 75% for the top ten candidate terms.
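The abstract does not give the scoring formula, so the following is only a minimal sketch of the idea under stated assumptions. For each unit (a word, or a single character in the character-based variant), it counts the distinct units that appear immediately to its left and right inside candidate compounds. It then takes the geometric mean of (left + 1)(right + 1) over a candidate's units and multiplies that by the candidate's frequency. This LR-style weighting, the use of distinct (type) counts, and the names `adjacency_counts` and `score_candidates` are assumptions for illustration, not the paper's confirmed method.

```python
from collections import Counter, defaultdict
from math import prod
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Unit = str                  # a word (word-based) or a single character (character-based)
Candidate = Tuple[Unit, ...]


def adjacency_counts(
    candidates: Iterable[Sequence[Unit]],
) -> Tuple[Dict[Unit, int], Dict[Unit, int]]:
    """Count, for every unit, how many DISTINCT units adjoin it on the left
    and on the right across all candidate compounds (assumed type counts)."""
    left: Dict[Unit, Set[Unit]] = defaultdict(set)
    right: Dict[Unit, Set[Unit]] = defaultdict(set)
    for units in candidates:
        for i, u in enumerate(units):
            if i > 0:
                left[u].add(units[i - 1])
            if i + 1 < len(units):
                right[u].add(units[i + 1])
    return ({u: len(s) for u, s in left.items()},
            {u: len(s) for u, s in right.items()})


def score_candidates(occurrences: List[Sequence[Unit]]) -> List[Tuple[Candidate, float]]:
    """Rank candidates by frequency * geometric mean of (left+1)*(right+1).
    Hypothetical scoring; the paper's exact weighting may differ."""
    freq = Counter(tuple(c) for c in occurrences if c)   # skip empty candidates
    ln, rn = adjacency_counts(freq)                      # iterating a Counter yields distinct keys
    scored = []
    for cand, f in freq.items():
        productivity = prod((ln.get(u, 0) + 1) * (rn.get(u, 0) + 1) for u in cand)
        lr = productivity ** (1.0 / (2 * len(cand)))     # geometric mean over both sides of every unit
        scored.append((cand, f * lr))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy word-segmented candidates (invented data, not from the paper).
    occurrences = [["信息", "检索"], ["信息", "检索"], ["信息", "系统"],
                   ["检索", "系统"], ["系统"]]
    for cand, score in score_candidates(occurrences):
        print("".join(cand), round(score, 3))            # 信息检索 is ranked first
```

For the character-based variant, the same functions can be fed character sequences, for example `score_candidates([list("信息检索"), list("信息系统")])`. Adding 1 to each count keeps a unit with no observed neighbours from zeroing out the whole product.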
Similar resources
Chinese Term Extraction from Web Pages Based on Compound word Productivity
In this paper, we propose an automatic term recognition system for Chinese. Our idea is based on the relation between a compound word and its constituents, which are simple words or individual Chinese characters. More precisely, we focus on how many words/characters adjoin the word/character in question to form compound words. We also take term frequency into account. We evaluated...
A New Model for Phrase Search Based on Minimum Weighted Displacement
Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query depends on the user's perception and increases the complexity of ranking algorithms. Another issue is that users often explore only the first 10 to 20 results, while millions of pages related to a query may exist. So search engines have to use sui...
Chinese-English Term Translation Mining Based on Semantic Prediction
Using abundant Web resources to mine Chinese term translations can be applied in many fields such as reading/writing assistant, machine translation and crosslanguage information retrieval. In mining English translations of Chinese terms, how to obtain effective Web pages and evaluate translation candidates are two challenging issues. In this paper, the approach based on semantic prediction is f...
Semi-Supervised Lexicon Mining from Parenthetical Expressions in Monolingual Web Pages
This paper presents a semi-supervised learning framework for mining Chinese-English lexicons from a large number of Chinese Web pages. The issue is motivated by the observation that many Chinese neologisms are accompanied by their English translations in parentheses. We classify parenthetical translations into bilingual abbreviations, transliterations, and translations. A frequency-ba...
Improving Translation of Unknown Proper Names Using a Hybrid Web-based Translation Extraction Method
Recently, we have proposed several effective Web-based term translation extraction methods that exploit Web resources to translate Web query terms. However, many unknown proper names in Web queries are still difficult to translate using our previous Web-based term translation extraction methods. Therefore, in this paper we propose a new hybrid translation extraction method, w...
Publication date: 2004